Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Neural Information Processing Systems

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution belongs to a one-dimensional exponential family, which covers many common reward distributions including the Bernoulli, Gaussian, Gamma, and Exponential distributions. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid under-estimating the optimal arm. We provide a tight regret analysis for ExpTS that simultaneously yields both a finite-time regret bound and an asymptotic regret bound. In particular, for a $K$-armed bandit with exponential family rewards over a horizon $T$, ExpTS is sub-UCB (a strong problem-dependent criterion for finite-time regret), minimax optimal up to a factor of $\sqrt{\log K}$, and asymptotically optimal. Moreover, we propose ExpTS$^+$, which adds a greedy exploitation step to the sampling distribution used in ExpTS to avoid over-estimating sub-optimal arms. ExpTS$^+$ is an anytime bandit algorithm that achieves minimax optimality and asymptotic optimality simultaneously for exponential family reward distributions. Our proof techniques are general and conceptually simple, and can be easily applied to analyze standard Thompson sampling with specific reward distributions.
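The abstract does not specify ExpTS's novel sampling distribution, so as a point of reference only, here is a minimal sketch of standard Thompson sampling for the Bernoulli case, one member of the one-dimensional exponential family. The Beta(1, 1) priors, the two-arm instance, and all function names are illustrative assumptions, not details from the paper.

```python
import random

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Standard Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    This is the classical baseline, not the ExpTS sampling distribution,
    which the abstract leaves unspecified.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # per-arm count of reward 1
    failures = [0] * k   # per-arm count of reward 0
    total_reward = 0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest draw.
        draws = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

Under-estimation of the optimal arm, which the abstract says ExpTS is designed to avoid, arises in this vanilla scheme when an unlucky early posterior for the best arm suppresses its samples for many rounds.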




Review for NeurIPS paper: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

Neural Information Processing Systems

Additional Feedback: I read the paper with interest, but was a bit disappointed in the end. Asymptotic optimality seems to be the focus of the paper, and this is the point I disagree with. Having asymptotic optimality is certainly good, but performing well only on that metric, rather than on finite-time optimality, is not enough given that linear contextual bandits have been studied extensively. In particular, a simple epsilon-greedy algorithm with epsilon decreasing to 0 at an appropriate rate is already asymptotically optimal. So in my view, finite-time regret must be the clear performance metric for evaluating an algorithm.
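The reviewer's decaying-epsilon-greedy baseline can be sketched as below. The review does not specify the decay schedule; the choice $\epsilon_t = \min(1, c/t)$ and the constant `c` are illustrative assumptions, shown here for a plain (non-contextual) Bernoulli bandit for simplicity.

```python
import random

def epsilon_greedy_decaying(true_means, horizon, c=5.0, seed=0):
    """Epsilon-greedy with epsilon_t = min(1, c / t) decaying toward zero.

    The schedule c / t is one standard choice of "appropriate rate";
    the constant c is an illustrative assumption.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    means = [0.0] * k
    total_reward = 0
    for t in range(1, horizon + 1):
        if rng.random() < min(1.0, c / t):
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            arm = max(range(k), key=means.__getitem__)  # exploit: empirical best arm
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return total_reward
```

The expected number of exploration rounds grows only like $c \log T$, which is why such schedules can be asymptotically optimal while their finite-time regret remains sensitive to the choice of $c$, the point underlying the reviewer's objection.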



Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Jin, Tianyuan, Xu, Pan, Xiao, Xiaokui, Anandkumar, Anima

arXiv.org Machine Learning
